Abstract

Markov logic uses weighted formulas to compactly encode a probability distribution over possible worlds. Despite the use of logical formulas, Markov logic networks (MLNs) can be difficult to interpret, due to the often counter-intuitive meaning of their weights. To address this issue, we propose a method to construct a possibilistic logic theory that exactly captures what can be derived from a given MLN using maximum a posteriori (MAP) inference. Unfortunately, the size of this theory is exponential in general. We therefore also propose two methods which can derive compact theories that still capture MAP inference, but only for specific types of evidence. These theories can be used, among others, to make explicit the hidden assumptions underlying an MLN or to explain the predictions it makes.


1 INTRODUCTION 

Markov logic [22] and possibilistic logic [9] are two popular logics for modelling uncertain beliefs. Both logics share a number of important characteristics. At the syntactic level, formulas correspond to pairs $(\alpha, \lambda)$, consisting of a classical formula $\alpha$ and a certainty weight $\lambda$, while at the semantic level, sets of these formulas induce a mapping from possible worlds to [0,1], encoding the relative plausibility of each possible world.

Despite their close similarities, however, Markov logic and possibilistic logic have been developed in different communities and for different purposes; Markov logic has mainly been studied in a machine learning context whereas possibilistic logic has been studied as a knowledge representation language. This reflects the complementary strengths and weaknesses of these logics. On the one hand, the qualitative nature of possibilistic logic makes it challenging to use for learning; although a few interesting approaches for learning possibilistic logic theories from data have been explored (e.g. ESI), their impact on applications to date has been limited. On the other hand, the intuitive meaning of Markov logic theories is often difficult to grasp, which limits the potential of Markov logic for knowledge representation. The main culprit is that the meaning of a theory can often not be understood by looking at the individual formulas in isolation. This issue, among others, has been highlighted in [26], where coherence measures are proposed that evaluate to what extent the formulation of a Markov logic theory is misleading.

Example 1. Consider the following Markov logic formulas:

$$+\infty : \text{antarctic-bird}(X) \rightarrow \text{bird}(X)$$

$$10 : \text{bird}(X) \rightarrow \text{flies}(X)$$

$$5 : \text{antarctic-bird}(X) \rightarrow \neg\text{flies}(X)$$

While the last formula might appear to suggest that antarctic birds cannot fly, in combination with the other two formulas, it merely states that antarctic birds are less likely to fly than birds in general.

Possibilistic logic is based on a purely qualitative, comparative model of uncertainty: while a Markov logic theory compactly encodes a probability distribution over the set of possible worlds, a possibilistic logic theory merely encodes a ranking of these possible worlds. Even though a probability distribution offers a much richer uncertainty model, many applications of Markov logic are based on MAP inference, which only relies on the ranking induced by the probability distribution.

In this paper, we first show how to construct a possibilistic logic theory $\Theta$, given a Markov logic theory $\mathcal{M}$, such that the conclusions that we can infer from $\Theta$ are exactly those conclusions that we can obtain from $\mathcal{M}$ using MAP inference. Our construction can be seen as the syntactic counterpart of the probability-possibility transformation from cni. In principle, it allows us to combine the best of both worlds, using $\mathcal{M}$ for making predictions while using $\Theta$ for elucidating the knowledge that is captured by $\mathcal{M}$ (e.g. to verify that the theory $\mathcal{M}$ is sensible). However, the size of $\Theta$ can be exponential in the size of $\mathcal{M}$, which is unsurprising given that the computational complexity of MAP inference is higher than the complexity of inference in possibilistic logic. To overcome this problem, we begin by studying ground (i.e. propositional) theories and propose two novel approaches for transforming a ground MLN into a compact ground possibilistic logic theory that still correctly captures MAP inference, but only for specific types of evidence (e.g. sets of at most $k$ literals). Then we lift one of these approaches such that it can transform a first-order MLN into a first-order possibilistic logic theory. Finally, we present several examples that illustrate how the transformation process can be used to help identify unintended consequences of a given MLN, and more generally, to better understand its behaviour.

The remainder of the paper is structured as follows. In the next section, we provide some background on Markov logic and possibilistic logic. In Section 3 we analyse the relation between MAP inference in ground Markov logic networks and possibilistic logic inference, introducing in particular two methods for deriving compact theories. Section 4 then discusses how we can exploit the symmetries in the case of an ungrounded Markov logic network, while Section 5 provides some illustrative examples. Finally, we provide an overview of related work in Section 6.

Due to space limitations, some of the proofs have been omitted from this paper. These proofs can be found in an online appendix.

2 BACKGROUND 

2.1 MARKOV LOGIC 

A Markov logic network (MLN) [22] is a set of pairs $(F, w_F)$, where $F$ is a formula in first-order logic and $w_F$ is a real number, intuitively reflecting a penalty that is applied to possible worlds (i.e. logical interpretations) that violate $F$. In examples, we will also use the notation $w_F : F$ to denote the formula $(F, w_F)$. An MLN serves as a template for constructing a propositional Markov network. In particular, given a set of constants $C$, an MLN $\mathcal{M}$ induces the following probability distribution on possible worlds $\omega$:

$$P_{\mathcal{M}}(\omega) = \frac{1}{Z} \exp\left( \sum_{(F, w_F) \in \mathcal{M}} w_F \, n_F(\omega) \right), \qquad (1)$$

where $n_F(\omega)$ is the number of true groundings of $F$ in the possible world $\omega$, and $Z$ is a normalization constant to ensure that $P_{\mathcal{M}}$ can be interpreted as a probability distribution. Sometimes, formulas $(F, w_F)$ are considered where $w_F = +\infty$, to represent hard constraints. In such cases, we define $P_{\mathcal{M}}(\omega) = 0$ for all possible worlds that do not satisfy all of the hard constraints, and only formulas with a real-valued weight are considered in (1) for the possible worlds that do. Note that a Markov logic network can be seen as a weighted set of propositional formulas, which are obtained by grounding the formulas in $\mathcal{M}$ w.r.t. the set of constants $C$ in the usual way. In the particular case that all formulas in $\mathcal{M}$ are already grounded, $\mathcal{M}$ corresponds to a theory in penalty logic ina.
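As a minimal sketch of Equation (1), the following Python fragment enumerates all possible worlds of a ground MLN and normalizes the exponentiated scores. The MLN is the one from Example 1, grounded for a single constant; the atom names, the helper functions and the use of `math.inf` for hard constraints are assumptions made for illustration only.

```python
import math
from itertools import product

ATOMS = ["antarctic", "bird", "flies"]

# (weight, formula) pairs; math.inf marks a hard constraint (illustrative encoding).
MLN = [
    (math.inf, lambda w: (not w["antarctic"]) or w["bird"]),
    (10.0, lambda w: (not w["bird"]) or w["flies"]),
    (5.0, lambda w: (not w["antarctic"]) or (not w["flies"])),
]

def worlds(atoms):
    """Enumerate all truth assignments over the given atoms."""
    for bits in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, bits))

def score(world):
    """Sum of the weights of satisfied soft formulas; None if a hard constraint fails."""
    total = 0.0
    for weight, formula in MLN:
        if formula(world):
            if weight != math.inf:
                total += weight
        elif weight == math.inf:
            return None
    return total

def distribution():
    """List of (world, probability) pairs following Equation (1)."""
    scored = [(w, score(w)) for w in worlds(ATOMS)]
    z = sum(math.exp(s) for _, s in scored if s is not None)
    return [(w, 0.0 if s is None else math.exp(s) / z) for w, s in scored]

def conditional(dist, query, given):
    """P(query | given), obtained by summing world probabilities."""
    num = sum(p for w, p in dist if given(w) and query(w))
    den = sum(p for w, p in dist if given(w))
    return num / den

dist = distribution()
p_fly_bird = conditional(dist, lambda w: w["flies"], lambda w: w["bird"])
p_fly_antarctic = conditional(dist, lambda w: w["flies"], lambda w: w["antarctic"])
print(p_fly_bird, p_fly_antarctic)
```

Under these assumptions the antarctic bird is still more likely to fly than not, only less so than a bird in general, which is the reading given in Example 1.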

One common inference task in MLNs is full MAP inference. In this setting, given a set of ground literals (the evidence) the goal is to compute the most probable configuration of all unobserved variables (the queries). Two standard approaches for performing MAP inference in MLNs are to employ a strategy based on MaxWalkSAT [22] or to use a cutting plane based strategy EHIIsl. Given a set of ground formulas $E$, we write $\max(\mathcal{M}, E)$ for the set of most probable worlds of the MLN that satisfy $E$. For each $\omega \in \max(\mathcal{M}, E)$, $\sum_{(F, w_F) \in \mathcal{M}} w_F \, n_F(\omega)$ evaluates to the same value, which we will refer to as $sat(\mathcal{M}, E)$. We define the penalty $pen(\mathcal{M}, E)$ of $E$ as follows:

$$pen(\mathcal{M}, E) = sat(\mathcal{M}, \emptyset) - sat(\mathcal{M}, E)$$

We will sometimes identify possible worlds with the set of literals they make true, writing $pen(\mathcal{M}, \omega)$. We will also write $pen(\mathcal{M}, \alpha)$, with $\alpha$ a ground formula, as a shorthand for $pen(\mathcal{M}, \{\alpha\})$. We will consider the following inference relation, which has been considered among others in lITSl:

$$(\mathcal{M}, E) \vdash_{MAP} \alpha \quad \text{iff} \quad \forall \omega \in \max(\mathcal{M}, E) : \omega \models \alpha \qquad (2)$$

with $\mathcal{M}$ an MLN, $\alpha$ a ground formula and $E$ a set of ground formulas. It can be shown that checking $(\mathcal{M}, E) \vdash_{MAP} \alpha$ for a ground network $\mathcal{M}$ is $\Delta_2^P$-complete El.
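The definitions of $\max(\mathcal{M}, E)$, $pen(\mathcal{M}, E)$ and the relation in (2) can be illustrated by a brute-force sketch. It reuses the single-constant grounding of Example 1; the function names are hypothetical, and exhaustive enumeration is only workable for a handful of atoms (it is not one of the MAP solvers mentioned above).

```python
import math
from itertools import product

ATOMS = ["antarctic", "bird", "flies"]
MLN = [
    (math.inf, lambda w: (not w["antarctic"]) or w["bird"]),
    (10.0, lambda w: (not w["bird"]) or w["flies"]),
    (5.0, lambda w: (not w["antarctic"]) or (not w["flies"])),
]

def worlds():
    """Enumerate all truth assignments over ATOMS."""
    for bits in product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, bits))

def sat_value(world):
    """Total weight of satisfied soft formulas, or -inf if a hard constraint is violated."""
    total = 0.0
    for weight, formula in MLN:
        holds = formula(world)
        if weight == math.inf:
            if not holds:
                return -math.inf
        elif holds:
            total += weight
    return total

def map_worlds(evidence):
    """Return (max(M, E), sat(M, E)) for evidence given as a list of predicates on worlds."""
    candidates = [w for w in worlds() if all(e(w) for e in evidence)]
    best = max((sat_value(w) for w in candidates), default=-math.inf)
    return [w for w in candidates if sat_value(w) == best], best

def pen(evidence):
    """pen(M, E) = sat(M, {}) - sat(M, E)."""
    return map_worlds([])[1] - map_worlds(evidence)[1]

def map_entails(evidence, query):
    """(M, E) |-MAP query: the query holds in every most probable world of E."""
    return all(query(w) for w in map_worlds(evidence)[0])

antarctic = lambda w: w["antarctic"]
bird = lambda w: w["bird"]
flies = lambda w: w["flies"]
print(pen([antarctic]), map_entails([antarctic], flies))
```

In this sketch the evidence that the individual is an antarctic bird has penalty 5, and MAP inference still concludes that it flies.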

2.2 POSSIBILISTIC LOGIC 

A possibility distribution in a universe $\Omega$ is a mapping $\pi$ from $\Omega$ to [0,1], encoding our knowledge about the possible values that a given variable $X$ can take; throughout this paper, we will assume that all universes are finite. For each $x \in \Omega$, $\pi(x)$ is called the possibility degree of $x$. By convention, in a state of complete ignorance, we have $\pi(x) = 1$ for all $x \in \Omega$; conversely, if $X = x_0$ is known, we have $\pi(x_0) = 1$ and $\pi(x) = 0$ for $x \neq x_0$. Possibility theory [27] [TJ] is based on the possibility measure $\Pi$ and dual necessity measure $N$, induced by a possibility distribution $\pi$ as follows ($A \subseteq \Omega$):

$$\Pi(A) = \max_{a \in A} \pi(a)$$

$$N(A) = 1 - \Pi(\Omega \setminus A)$$

^The complexity class $\Delta_2^P$ contains those decision problems that can be solved in polynomial time on a deterministic Turing machine with access to an NP oracle.




Intuitively, $\Pi(A)$ is the degree to which available evidence is compatible with the view that $X$ belongs to $A$, whereas $N(A)$ is the degree to which available evidence implies that $X$ belongs to $A$, i.e. the degree to which it is certain that $X$ belongs to $A$.
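The two measures can be computed directly from a possibility distribution. The sketch below assumes the distribution is stored as a Python dictionary over a finite universe; the weather values are made up for illustration.

```python
def possibility(pi, subset):
    """Pi(A): the maximum of pi over A (taken to be 0 for the empty set)."""
    return max((pi[x] for x in subset), default=0.0)

def necessity(pi, subset):
    """N(A) = 1 - Pi(complement of A)."""
    complement = set(pi) - set(subset)
    return 1.0 - possibility(pi, complement)

# Hypothetical possibility distribution over three mutually exclusive values.
pi = {"sunny": 1.0, "cloudy": 0.7, "rainy": 0.2}
A = {"sunny", "cloudy"}
print(possibility(pi, A), necessity(pi, A))

# Complete ignorance: every value is fully possible, so nothing is certain.
ignorance = {x: 1.0 for x in pi}
# Complete knowledge: only one value remains possible.
known = {"sunny": 1.0, "cloudy": 0.0, "rainy": 0.0}
```

Here `A` is fully possible, and it is certain to degree 0.8 because the only value outside `A` has possibility 0.2.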

A theory in possibilistic logic $\Theta$ is a set of formulas of the form $(\alpha, \lambda)$, where $\alpha$ is a propositional formula and $\lambda \in [0,1]$ is a certainty weight. A possibility distribution $\pi$ satisfies $(\alpha, \lambda)$ iff $N([\![\alpha]\!]) \geq \lambda$, with $N$ the necessity measure induced by $\pi$ and $[\![\alpha]\!]$ the set of propositional models of $\alpha$. We say that a possibilistic logic theory $\Theta$ entails $(\alpha, \lambda)$, written $\Theta \models (\alpha, \lambda)$, if every possibility distribution which satisfies all the formulas in $\Theta$ also satisfies $(\alpha, \lambda)$. A possibility distribution $\pi_1$ is called less specific than a possibility distribution $\pi_2$ if $\pi_1(\omega) \geq \pi_2(\omega)$ for every $\omega$. It can be shown that the set of models of $\Theta$ always has a least element w.r.t. the minimal specificity ordering, which is called the least specific model $\pi^*$ of $\Theta$. It is easy to see that $\Theta \models (\alpha, \lambda)$ iff $\pi^*$ satisfies $(\alpha, \lambda)$.

Even though the semantics of possibilistic logic is defined at the propositional level, we will also use first-order formulas such as $(p(X) \rightarrow q(X, Y), \lambda)$ throughout the paper. As in Markov logic, we will interpret these formulas as abbreviations for a set of propositional formulas, obtained using grounding in the usual way. In particular, we will always assume that first-order formulas are defined w.r.t. a finite set of constants.

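The $\lambda$-cut $\Theta_\lambda$ of a possibilistic logic theory $\Theta$ is defined as follows:

$$\Theta_\lambda = \{\alpha \mid (\alpha, \mu) \in \Theta, \mu \geq \lambda\}$$

It can be shown that $\Theta \models (\alpha, \lambda)$ iff $\Theta_\lambda \models \alpha$, which means that inference in possibilistic logic can straightforwardly be implemented using a SAT solver.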
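The reduction of possibilistic entailment to classical entailment on a $\lambda$-cut can be sketched as follows. For brevity the classical check enumerates truth assignments instead of calling a SAT solver, and the two-formula theory is a hypothetical example, not one taken from the paper.

```python
from itertools import product

ATOMS = ["p", "q"]

# Hypothetical theory: (p, 0.8) and (p -> q, 0.5).
THEORY = [
    (lambda w: w["p"], 0.8),
    (lambda w: (not w["p"]) or w["q"], 0.5),
]

def worlds():
    """Enumerate all truth assignments over ATOMS."""
    for bits in product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, bits))

def cut(theory, level):
    """The lambda-cut: all formulas whose certainty weight is at least `level`."""
    return [formula for formula, weight in theory if weight >= level]

def classically_entails(formulas, query):
    """Brute-force classical entailment (stand-in for a SAT call)."""
    return all(query(w) for w in worlds() if all(f(w) for f in formulas))

def poss_entails(theory, query, level):
    """Theta |= (query, level) iff the level-cut of Theta classically entails query."""
    return classically_entails(cut(theory, level), query)

p = lambda w: w["p"]
q = lambda w: w["q"]
print(poss_entails(THEORY, q, 0.5), poss_entails(THEORY, q, 0.8))
```

The conclusion `q` is only as certain as the weakest formula needed to derive it, so it is entailed at level 0.5 but not at level 0.8.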

An inconsistency-tolerant inference relation $\vdash_{poss}$ for possibilistic logic can be defined as follows:

$$\Theta \vdash_{poss} \alpha \quad \text{iff} \quad \Theta_{con(\Theta)} \models \alpha$$

where the consistency level $con(\Theta)$ of $\Theta$ is the lowest certainty level $\lambda$ for which $\Theta_\lambda$ is satisfiable (among the certainty levels that occur in $\Theta$). Note that all formulas with a certainty level below $con(\Theta)$ are ignored, even if they are unrelated to any inconsistency in $\Theta$. This observation is known as the drowning effect.

We will write $(\Theta, E) \vdash_{poss} \alpha$, with $E$ a set of propositional formulas, as an abbreviation for $\Theta \cup \{(e, 1) \mid e \in E\} \vdash_{poss} \alpha$. Despite its conceptual simplicity, $\vdash_{poss}$ has many desirable properties. Among others, it is closely related to AGM belief revision lH) and default reasoning 10. It can be shown that checking $\Theta \vdash_{poss} (\alpha, \lambda)$ is a $\Theta_2^P$-complete problem iflTl. In this paper, $\vdash_{poss}$ will allow us to capture the non-monotonicity of MAP inference.

^The complexity class $\Theta_2^P$ contains those decision problems


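Example 2. Let $\Theta$ consist of the following formulas:

$$(\text{penguin}(X) \rightarrow \text{bird}(X), 1)$$

$$(\text{penguin}(X) \rightarrow \neg\text{flies}(X), 1)$$

$$(\text{bird}(X) \rightarrow \text{flies}(X), 0.5)$$

Then we find:

$$(\Theta, \{\text{bird}(\text{tweety})\}) \vdash_{poss} \text{flies}(\text{tweety})$$

$$(\Theta, \{\text{bird}(\text{tweety}), \text{penguin}(\text{tweety})\}) \vdash_{poss} \neg\text{flies}(\text{tweety})$$

In general, $\vdash_{poss}$ allows us to model rules with exceptions, by ensuring that rules about specific contexts have a higher certainty weight than rules about general contexts.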
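Example 2 can be checked mechanically. The sketch below grounds the three rules for the single constant tweety, computes the consistency level by trying the certainty levels in increasing order, and then tests entailment on the corresponding cut. The function names are illustrative, and brute-force enumeration again stands in for a SAT solver.

```python
from itertools import product

ATOMS = ["penguin", "bird", "flies"]
WORLDS = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=len(ATOMS))]

# Ground version of the theory of Example 2 for the constant tweety.
THETA = [
    (lambda w: (not w["penguin"]) or w["bird"], 1.0),
    (lambda w: (not w["penguin"]) or (not w["flies"]), 1.0),
    (lambda w: (not w["bird"]) or w["flies"], 0.5),
]

def cut(theory, level):
    """Formulas whose certainty weight is at least `level`."""
    return [f for f, weight in theory if weight >= level]

def models(formulas):
    """All worlds satisfying every formula in the list."""
    return [w for w in WORLDS if all(f(w) for f in formulas)]

def consistency_level(theory):
    """Lowest certainty level occurring in the theory whose cut is satisfiable."""
    for level in sorted({weight for _, weight in theory}):
        if models(cut(theory, level)):
            return level
    return None

def poss_infer(theory, evidence, query):
    """(Theta, E) |-poss query, with each evidence formula added at certainty 1."""
    extended = theory + [(e, 1.0) for e in evidence]
    level = consistency_level(extended)
    if level is None:
        raise ValueError("evidence is inconsistent with the fully certain formulas")
    return all(query(w) for w in models(cut(extended, level)))

bird = lambda w: w["bird"]
penguin = lambda w: w["penguin"]
flies = lambda w: w["flies"]
not_flies = lambda w: not w["flies"]
print(poss_infer(THETA, [bird], flies), poss_infer(THETA, [bird, penguin], not_flies))
```

With the penguin evidence the cut at level 0.5 is inconsistent, so the default rule about birds drowns and only the rules with weight 1 are used.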

3 ENCODING GROUND NETWORKS 

Throughout this section, we will assume that $\mathcal{M}$ is a ground MLN in which all the weights are strictly positive. This can always be guaranteed for ground MLNs by replacing formulas $(\alpha, \lambda)$ with $\lambda < 0$ by $(\neg\alpha, -\lambda)$, and by discarding any formula whose weight is 0. For a subset $X \subseteq \mathcal{M}$, we write $X^*$ for the set of corresponding classical formulas, e.g. for $X = \{(F_1, w_1), ..., (F_n, w_n)\}$ we have $X^* = \{F_1, ..., F_n\}$. In particular, $\mathcal{M}^*$ are the classical formulas appearing in the MLN $\mathcal{M}$.

The following transformation constructs a possibilistic logic theory that is in some sense equivalent to a given MLN. It is inspired by the probability-possibility transformation from ifTOll.

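Transformation 1. We define the possibilistic logic theory $\Theta_{\mathcal{M}}$ corresponding to an MLN $\mathcal{M}$ as follows:

$$\left\{ \left( \bigvee X^*, \phi\left(\neg \bigvee X^*\right) \right) \mid X \subseteq \mathcal{M}, \phi\left(\neg \bigvee X^*\right) > 0 \right\} \qquad (3)$$

where for a propositional formula $\alpha$:

$$\phi(\alpha) = \begin{cases} \frac{K + pen(\mathcal{M}, \alpha)}{L} & \text{if } \alpha \text{ satisfies the hard constraints} \\ 1 & \text{otherwise} \end{cases}$$

and the constants $K$ and $L$ are chosen such that $0 = \phi(\top) < \phi(\alpha) < 1$ for every $\alpha$ that satisfies the hard constraints (i.e. the formulas with weight $+\infty$).

In the following we will use the notations $\phi(\omega)$ for a possible world $\omega$ and $\phi(E)$ for a set of formulas $E$, defined entirely analogously. Throughout the paper we will also write ($-K < x < L - K$):

$$\lambda_x = \frac{K + x}{L}$$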
“ L 

The correctness of Transformation 1 follows from the next proposition, which is easy to show.

that can be solved in polynomial time on a deterministic Turing machine, by making at most a logarithmic number of calls to an NP oracle.






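Proposition 1. Let $\mathcal{M}$ be a ground MLN and $\Theta_{\mathcal{M}}$ the corresponding possibilistic logic theory. Let $\pi$ be the least specific model of $\Theta_{\mathcal{M}}$. It holds that:

$$\pi(\omega) = 1 - \phi(\omega)$$

Corollary 1. Let $\mathcal{M}$ be a ground MLN and $\Theta_{\mathcal{M}}$ the corresponding possibilistic logic theory. It holds that for $\lambda < 1$:

$$\Theta_{\mathcal{M}} \models (\alpha, \lambda) \quad \text{iff} \quad pen(\mathcal{M}, \neg\alpha) \geq \lambda L - K$$

and

$$\Theta_{\mathcal{M}} \models (\alpha, 1) \quad \text{iff} \quad pen(\mathcal{M}, \neg\alpha) = +\infty$$

Corollary 2. Let $\mathcal{M}$ be a ground MLN and $\Theta_{\mathcal{M}}$ the corresponding possibilistic logic theory. For $P_{\mathcal{M}}$ the probability distribution induced by $\mathcal{M}$ and $\pi$ the least specific model of $\Theta_{\mathcal{M}}$, it holds that

$$P_{\mathcal{M}}(\omega_1) > P_{\mathcal{M}}(\omega_2) \quad \text{iff} \quad \pi(\omega_1) > \pi(\omega_2)$$

for all possible worlds $\omega_1$ and $\omega_2$. In particular, it follows that for every propositional formula $\alpha$ and every set of propositional formulas $E$:

$$(\mathcal{M}, E) \vdash_{MAP} \alpha \quad \text{iff} \quad (\Theta_{\mathcal{M}}, E) \vdash_{poss} \alpha \qquad (4)$$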

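Example 3. Consider the MLN $\mathcal{M}$ containing the following formulas:

$$5 : a \rightarrow x \qquad 5 : a \rightarrow y \qquad 10 : a \wedge b \rightarrow \neg y$$

Then $\Theta_{\mathcal{M}}$ contains the following formulas:

$$\lambda_5 : a \rightarrow x \qquad \lambda_5 : a \rightarrow y$$

$$\lambda_{10} : a \wedge b \rightarrow \neg y \qquad \lambda_{10} : a \rightarrow x \vee y$$

$$\lambda_{15} : a \wedge b \rightarrow x \vee \neg y$$

It can be verified that:

$$(\Theta_{\mathcal{M}}, \{a\}) \vdash_{poss} x \wedge y \qquad (\Theta_{\mathcal{M}}, \{a, b\}) \vdash_{poss} x \wedge \neg y$$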
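The penalties behind the subscripts in Example 3 can be reproduced by brute force. The sketch below enumerates the non-empty subsets $X$ of the MLN, computes $pen(\mathcal{M}, \neg\bigvee X^*)$ for each, and skips subsets whose disjunction is a tautology. It reports raw penalties $x$ rather than certainty weights $\lambda_x$, so no particular choice of $K$ and $L$ is assumed; all function names are hypothetical.

```python
from itertools import combinations, product

ATOMS = ["a", "b", "x", "y"]
MLN = [
    ("a -> x", 5, lambda w: (not w["a"]) or w["x"]),
    ("a -> y", 5, lambda w: (not w["a"]) or w["y"]),
    ("a & b -> ~y", 10, lambda w: (not (w["a"] and w["b"])) or (not w["y"])),
]
WORLDS = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=len(ATOMS))]

def sat_value(world):
    """Total weight of the formulas satisfied in the world."""
    return sum(weight for _, weight, f in MLN if f(world))

def pen(condition):
    """pen(M, alpha) for alpha given as a predicate; None if alpha has no model."""
    values = [sat_value(w) for w in WORLDS if condition(w)]
    if not values:
        return None
    return max(sat_value(w) for w in WORLDS) - max(values)

def transformation1():
    """(formula names in X, pen(M, not OR X)) for each X with a non-tautological disjunction."""
    result = []
    for size in range(1, len(MLN) + 1):
        for subset in combinations(MLN, size):
            penalty = pen(lambda w: not any(f(w) for _, _, f in subset))
            if penalty is not None:
                result.append(([name for name, _, _ in subset], penalty))
    return result

def map_entails(evidence, query):
    """(M, E) |-MAP query by enumeration."""
    candidates = [w for w in WORLDS if evidence(w)]
    best = max(sat_value(w) for w in candidates)
    return all(query(w) for w in candidates if sat_value(w) == best)

for names, penalty in transformation1():
    print(penalty, " v ".join(names))
```

The five printed disjunctions have penalties 5, 5, 10, 10 and 15, matching the five formulas listed in the example; the two remaining subsets yield tautologies and are dropped.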

An important drawback of the transformation to possibilistic logic is that the number of formulas in $\Theta_{\mathcal{M}}$ is exponential in $|\mathcal{M}|$. This makes the transformation inefficient, and moreover limits the interpretability of the possibilistic logic theory. In general, the exponential size of $\Theta_{\mathcal{M}}$ cannot be avoided if we want (4) to hold for any $E$ and $\alpha$. However, more compact theories can be found if we focus on specific types of evidence. Sections 3.1 and 3.2 introduce two practical methods to accomplish this.

3.1 SELECTIVELY AVOIDING DROWNING 

In many applications, we are only interested in particular types of evidence sets $E$. For example, we may only be interested in evidence sets that contain at most $k$ literals, or in evidence sets that only contain positive literals. In such cases, we can often derive a more compact possibilistic logic theory $\Theta^{\mathcal{E}}_{\mathcal{M}}$ as follows. Let $\mathcal{E}$ be the set of evidence sets that we wish to consider, where each $E \in \mathcal{E}$ is a set of ground formulas. Given $E \in \mathcal{E}$ we write $S_E$ for the set of all minimal subsets $\{F_1, ..., F_l\}$ of $\mathcal{M}^*_E = \{F \mid F \in \mathcal{M}^*, pen(\mathcal{M}, \neg F) < pen(\mathcal{M}, E)\}$ s.t.

$$pen(\mathcal{M}, \bigwedge E \wedge \neg F_1 \wedge ... \wedge \neg F_l) > pen(\mathcal{M}, E) \qquad (5)$$

The following transformation constructs a possibilistic logic theory that correctly captures MAP inference for evidence sets in $\mathcal{E}$. The basic intuition is that we want to weaken the formulas in $\mathcal{M}^*$ just enough to ensure that the resulting certainty level prevents them from drowning when the evidence $E$ becomes available.

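Transformation 2. Given a ground MLN $\mathcal{M}$ and a set of evidence sets $\mathcal{E}$, we define the possibilistic logic theory $\Theta^{\mathcal{E}}_{\mathcal{M}}$ as follows:

$$\{(F_i, \phi(\neg F_i)) \mid F_i \in \mathcal{M}^*\} \qquad (6)$$

$$\cup \ \{(\neg \bigwedge E \vee \bigvee Z, \phi(\bigwedge E \wedge \neg \bigvee Z)) \mid Z \in S_E, E \in \mathcal{E}\} \qquad (7)$$

$$\cup \ \{(\neg \bigwedge E, \phi(\bigwedge E)) \mid E \in \mathcal{E}\} \qquad (8)$$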

If $\mathcal{M}$ is clear from the context, we will omit the subscript in $\Theta^{\mathcal{E}}_{\mathcal{M}}$. The formulas in (6) are the direct counterpart of the MLN. Intuitively, there are two reasons why these formulas are not sufficient. First, due to the drowning effect, formulas $F$ such that $pen(\mathcal{M}, \neg F) < pen(\mathcal{M}, E)$ will be ignored under the evidence $E$. In such cases we should look at minimal ways to weaken these formulas such that the certainty level of the resulting formula is sufficient to avoid drowning under the evidence $E$. This is accomplished by adding the formulas in (7). Second, as $\Theta^{\mathcal{E}}_{\mathcal{M}}$ contains less information than $\Theta_{\mathcal{M}}$, we need to ensure that the consistency level for $\Theta^{\mathcal{E}}_{\mathcal{M}}$ is never lower than the consistency level for $\Theta_{\mathcal{M}}$, given an evidence set $E \in \mathcal{E}$. To this end, $\Theta^{\mathcal{E}}_{\mathcal{M}}$ includes the formulas in (8). The following example illustrates why these formulas are needed.

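Example 4. Consider the following MLN $\mathcal{M}$:

$$3 : u \qquad 2 : a \qquad 2 : b \qquad 1 : v \qquad 10 : (a \vee b) \wedge (u \vee v) \rightarrow \neg x$$

and let $\mathcal{E} = \{\{x\}\}$, i.e. the only evidence set in which we are interested is $\{x\}$. It holds that

$$S_E = \{\{a, u\}, \{b, u\}, \{a, v\}, \{b, v\}\} \qquad (9)$$

and $\Theta^{\mathcal{E}}_{\mathcal{M}} = \Theta \cup \Psi \cup \Gamma$, where:

$$\Theta = \{(u, \lambda_3), (a, \lambda_2), ((a \vee b) \wedge (u \vee v) \rightarrow \neg x, \lambda_{10}), (b, \lambda_2), (v, \lambda_1)\}$$

$$\Psi = \{(a \vee u \vee \neg x, \lambda_6), (b \vee u \vee \neg x, \lambda_6), (a \vee v \vee \neg x, \lambda_5), (b \vee v \vee \neg x, \lambda_5)\}$$

$$\Gamma = \{(\neg x, \lambda_4)\}$$

It is easy to verify that $(\Theta, \{x\}) \vdash_{poss} u$ whereas $(\mathcal{M}, \{x\}) \not\vdash_{MAP} u$ and $(\Theta \cup \Psi \cup \Gamma, \{x\}) \not\vdash_{poss} u$.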
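The sets $\mathcal{M}^*_E$ and $S_E$ of Example 4 can be recomputed by enumeration. The sketch assumes the MLN consists of the five formulas $3 : u$, $2 : a$, $2 : b$, $1 : v$ and $10 : (a \vee b) \wedge (u \vee v) \rightarrow \neg x$; the dictionary keys and helper names are illustrative. As before, penalties are reported instead of certainty weights.

```python
from itertools import combinations, product

ATOMS = ["a", "b", "u", "v", "x"]
MLN = {
    "u": (3, lambda w: w["u"]),
    "a": (2, lambda w: w["a"]),
    "b": (2, lambda w: w["b"]),
    "v": (1, lambda w: w["v"]),
    "rule": (10, lambda w: (not ((w["a"] or w["b"]) and (w["u"] or w["v"]))) or (not w["x"])),
}
WORLDS = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=len(ATOMS))]

def sat_value(world):
    """Total weight of the formulas satisfied in the world."""
    return sum(weight for weight, f in MLN.values() if f(world))

BEST = max(sat_value(w) for w in WORLDS)

def pen(condition):
    """pen(M, alpha) for a predicate alpha (infinite if alpha has no model)."""
    return BEST - max((sat_value(w) for w in WORLDS if condition(w)), default=float("-inf"))

def evidence(world):
    """The evidence set E = {x}."""
    return world["x"]

def violates_all(names):
    """Predicate stating that every named formula is false."""
    return lambda w: not any(MLN[n][1](w) for n in names)

pen_e = pen(evidence)
# M*_E: formulas that would drown under E.
candidates = sorted(n for n in MLN if pen(violates_all([n])) < pen_e)

def minimal_sets():
    """S_E with the penalty attached to each set, i.e. the weights used in (7)."""
    found = []
    for size in range(1, len(candidates) + 1):
        for names in combinations(candidates, size):
            if any(set(z) <= set(names) for z, _ in found):
                continue  # a smaller set already works, so this one is not minimal
            penalty = pen(lambda w: evidence(w) and violates_all(names)(w))
            if penalty > pen_e:
                found.append((names, penalty))
    return found

print(pen_e, candidates, minimal_sets())
```

Under these assumptions the evidence has penalty 4, the four atomic formulas drown, and the four pairs in (9) receive penalties 6, 6, 5 and 5.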

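We now prove the correctness of Transformation 2.

Proposition 2. For any formula $\alpha$ and any evidence set $E \in \mathcal{E}$, it holds that $(\Theta_{\mathcal{M}}, E) \vdash_{poss} \alpha$ iff $(\Theta^{\mathcal{E}}_{\mathcal{M}}, E) \vdash_{poss} \alpha$.

Proof. Let us introduce the following notation:

$$\lambda_E = con(\Theta_{\mathcal{M}} \cup \{(e, 1) \mid e \in E\})$$

$$\lambda^{\mathcal{E}}_E = con(\Theta^{\mathcal{E}}_{\mathcal{M}} \cup \{(e, 1) \mid e \in E\})$$

$$A = (\Theta_{\mathcal{M}} \cup \{(e, 1) \mid e \in E\})_{\lambda_E}$$

$$A^{\mathcal{E}} = (\Theta^{\mathcal{E}}_{\mathcal{M}} \cup \{(e, 1) \mid e \in E\})_{\lambda^{\mathcal{E}}_E}$$

We need to show that $A$ is equivalent to $A^{\mathcal{E}}$, for any $E \in \mathcal{E}$.

By Corollary 1 we know that every formula $(\alpha, \lambda)$ in $\Theta^{\mathcal{E}}_{\mathcal{M}}$ is entailed by $\Theta_{\mathcal{M}}$, hence $\lambda^{\mathcal{E}}_E \leq \lambda_E$. Since $\lambda_E$ is the smallest certainty level from $\Theta_{\mathcal{M}}$ which is strictly higher than $\phi(E)$, it follows that $A^{\mathcal{E}}$ contains every formula which appears in $\Theta^{\mathcal{E}}_{\mathcal{M}}$ with a weight that is strictly higher than $\phi(E)$. Moreover, since $\Theta^{\mathcal{E}}_{\mathcal{M}}$ by construction contains $(\neg \bigwedge E, \phi(\bigwedge E))$, we find that $A^{\mathcal{E}}$ can only contain such formulas:

$$A^{\mathcal{E}} = E \cup \{\alpha \mid (\alpha, \lambda) \in \Theta^{\mathcal{E}}_{\mathcal{M}}, \lambda > \phi(E)\} \qquad (10)$$

It follows that $A \models A^{\mathcal{E}}$.

Let $G_1 \vee ... \vee G_s$ be a formula from $A$. From Corollary 1 we know that:

$$pen(\mathcal{M}, \neg G_1 \wedge ... \wedge \neg G_s) > pen(\mathcal{M}, E)$$

and a fortiori

$$pen(\mathcal{M}, E \wedge \neg G_1 \wedge ... \wedge \neg G_s) > pen(\mathcal{M}, E)$$

This means that for any formula $G_1 \vee ... \vee G_s$ in $A$, either $pen(\mathcal{M}, \neg G_i) > pen(\mathcal{M}, E)$ for some $i$ or $S_E$ contains a subset $\{H_1, ..., H_r\}$ of $\{G_1, ..., G_s\}$. Then $\Theta^{\mathcal{E}}_{\mathcal{M}}$ contains either $G_i$ or the formula $\neg \bigwedge E \vee H_1 \vee ... \vee H_r$ with a weight which is strictly higher than $\phi(E)$ and thus either $G_i$ or $\neg \bigwedge E \vee H_1 \vee ... \vee H_r$ belongs to $A^{\mathcal{E}}$. In both cases we find $A^{\mathcal{E}} \models G_1 \vee ... \vee G_s$. We conclude $A^{\mathcal{E}} \models A$. □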

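An alternative, which would make the approach in this section closer to the standard encoding in Transformation 1, is to define $S'_E$ as the set of minimal subsets $\{F_1, ..., F_l\}$ of $\mathcal{M}^*_E$ such that

$$pen(\mathcal{M}, \neg F_1 \wedge ... \wedge \neg F_l) > pen(\mathcal{M}, E) \qquad (11)$$

and then replace the formulas in (7) by

$$\{(\bigvee Z, \phi(\neg \bigvee Z)) \mid Z \in S'_E\} \qquad (12)$$

The advantage of the original formulation, however, is that we can expect many of the sets in $S_E$ to be singletons. To see why this is the case, first note that for each world $\omega$ in $\max(\mathcal{M}, E)$, the set of formulas $Y \subseteq \mathcal{M}^*$ satisfied by $\omega$ is such that $pen(\mathcal{M}, \neg \bigvee (\mathcal{M}^* \setminus Y))$ is minimal among all sets $Y' \subseteq \mathcal{M}^*$ for which $E \wedge \bigwedge Y'$ is consistent. Let us write $Cons_E(\mathcal{M})$ for the set of all these maximally consistent subsets of $\mathcal{M}^*$. Note that $\max(\mathcal{M}, E)$ consists of exactly those models of $E$ that satisfy all the formulas of some $Y \in Cons_E(\mathcal{M})$.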

Lemma 1. For a set of formulas $\{F_1, ..., F_l\} \subseteq \mathcal{M}^*$ it holds that $pen(\mathcal{M}, E \wedge \neg F_1 \wedge ... \wedge \neg F_l) > pen(\mathcal{M}, E)$ iff $\{F_1, ..., F_l\} \cap Y \neq \emptyset$ for every $Y$ in $Cons_E(\mathcal{M})$.

Corollary 3. Let $Cons_E(\mathcal{M}) = \{Y_1, ..., Y_s\}$. It holds that $S_E$ consists of the subset-minimal elements of $\{\{y_1, ..., y_s\} \mid y_1 \in Y_1 \cap \mathcal{M}^*_E, ..., y_s \in Y_s \cap \mathcal{M}^*_E\}$.

Example 5. Consider again the MLN $\mathcal{M}$ from Example 4 and let $E = \{x\}$. It holds that $Cons_E(\mathcal{M}) = \{Y_1, Y_2\}$, where

$$Y_1 = \{(a \vee b) \wedge (u \vee v) \rightarrow \neg x, a, b\}$$

$$Y_2 = \{(a \vee b) \wedge (u \vee v) \rightarrow \neg x, u, v\}$$

$$\mathcal{M}^*_E = \{a, b, u, v\}$$

From Corollary 3 it follows that $S_E$ is given by (9).

In practice, $Cons_E(\mathcal{M})$ will often contain a single element, in which case all the elements of $S_E$ will be singletons.
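Corollary 3 suggests computing $S_E$ as the minimal hitting sets of the sets $Y_i \cap \mathcal{M}^*_E$. The following sketch does this for Example 5, again assuming the five-formula MLN $3 : u$, $2 : a$, $2 : b$, $1 : v$, $10 : (a \vee b) \wedge (u \vee v) \rightarrow \neg x$ and using hypothetical helper names.

```python
from itertools import product

ATOMS = ["a", "b", "u", "v", "x"]
MLN = {
    "u": (3, lambda w: w["u"]),
    "a": (2, lambda w: w["a"]),
    "b": (2, lambda w: w["b"]),
    "v": (1, lambda w: w["v"]),
    "rule": (10, lambda w: (not ((w["a"] or w["b"]) and (w["u"] or w["v"]))) or (not w["x"])),
}
WORLDS = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=len(ATOMS))]

def sat_value(world):
    """Total weight of the formulas satisfied in the world."""
    return sum(weight for weight, f in MLN.values() if f(world))

def best_value(condition):
    """sat(M, alpha) for a satisfiable predicate alpha."""
    return max(sat_value(w) for w in WORLDS if condition(w))

def pen(condition):
    """pen(M, alpha) = sat(M, {}) - sat(M, alpha)."""
    return best_value(lambda w: True) - best_value(condition)

def cons(evidence):
    """Cons_E(M): the formula sets satisfied by the most probable models of E."""
    top = best_value(evidence)
    return {frozenset(n for n, (_, f) in MLN.items() if f(w))
            for w in WORLDS if evidence(w) and sat_value(w) == top}

def minimal_hitting_sets(families):
    """Subset-minimal sets obtained by picking one element from every family."""
    picks = {frozenset(choice) for choice in product(*families)}
    return {s for s in picks if not any(t < s for t in picks)}

evidence = lambda w: w["x"]
m_star_e = {n for n, (_, f) in MLN.items()
            if pen(lambda w, f=f: not f(w)) < pen(evidence)}
cons_e = cons(evidence)
s_e = minimal_hitting_sets([sorted(y & m_star_e) for y in cons_e])
print(sorted(sorted(s) for s in s_e))
```

The result coincides with the set in (9), which was obtained there directly from condition (5).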

3.2 MAP INFERENCE AS DEFAULT REASONING 

A large number of approaches has been proposed for reasoning with a set of default rules of the form “if $\alpha$ then typically $\beta$” uniiiiiiii. At the core, each of the proposed semantics corresponds to the intuition that a set of default rules imposes a preference order on possible worlds, where “if $\alpha$ then $\beta$” means that $\beta$ is true in the most preferred models of $\alpha$. The approaches from ifTsl and llT^ can be elegantly captured in possibilistic logic El, by interpreting the default rule as the constraint $\Pi(\alpha \wedge \beta) > \Pi(\alpha \wedge \neg\beta)$. In Markov logic, the same constraint on the ordering of possible worlds can be expressed by imposing the constraint $(\mathcal{M}, \alpha) \vdash_{MAP} \beta$. In other words, we can view the MAP consequences of an MLN as a set of default rules, and encode these default rules in possibilistic logic. The following transformation is based on this idea.

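Transformation 3. Given a ground MLN $\mathcal{M}$ and a positive integer $k$, we construct a possibilistic logic theory $\Theta^k_{\mathcal{M}}$ as follows:

• For each hard rule $F$ from $\mathcal{M}$, add $(F, 1)$ to $\Theta^k_{\mathcal{M}}$.

• For each set of literals $E$ such that $0 \leq |E| \leq k$, let $X = \{x \mid (\mathcal{M}, E) \vdash_{MAP} x\}$ be the set of literals that are true in all the most plausible models of $E$. Unless there is a literal $y \in E$ such that $(\mathcal{M}, E \setminus \{y\}) \vdash_{MAP} y$, add

$$(\bigwedge E \rightarrow \bigwedge X, \lambda_E)$$

to $\Theta^k_{\mathcal{M}}$, where $\lambda_E = \phi(\bigwedge E)$. If $pen(\mathcal{M}, E) > pen(\mathcal{M}, \emptyset)$, add also

$$(\neg(\bigwedge E \wedge \bigwedge X), \lambda'_E) \qquad (13)$$

where $\lambda'_E$ is the certainty level just below $\lambda_E$ in $\Theta^k_{\mathcal{M}}$, i.e. $\lambda'_E = \max\{\lambda_{E'} \mid \lambda_{E'} < \lambda_E, |E'| \leq k\}$.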
If $\mathcal{M}$ is clear from the context, we will omit the subscript in $\Theta^k_{\mathcal{M}}$. The possibilistic encoding of default rules used in Transformation 3 is similar in spirit to the method from El, which is based on the Z-ranking from ESI. However, because $\mathcal{M}$ already provides us with a model of the default rules, we can directly encode default rules in possibilistic logic, without having to rely on the Z-ranking. Also note that although the method is described in terms of an MLN, it can be used for encoding any ranking on possible worlds (assuming a finite set of atoms).

As illustrated in the following example, (13) is needed to avoid deriving too much, serving a similar purpose to (8) in the approach from Section 3.1.

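Example 6. Consider the following MLN $\mathcal{M}$:

$$2 : \neg a \vee b \qquad 2 : a \vee b \qquad 1 : a \vee \neg b$$

Then $\Theta^k = \Theta \cup \Gamma'$, where

$$\Theta = \{(\top \rightarrow a \wedge b, \lambda_0), (\neg a \rightarrow b, \lambda_1), (\neg b \rightarrow \top, \lambda_2)\}$$

$$\Gamma' = \{(b, \lambda_1), (a \vee \neg b, \lambda_0)\}$$

We find $(\Theta, \{\neg b\}) \vdash_{poss} a$ while $(\mathcal{M}, \{\neg b\}) \not\vdash_{MAP} a$. Accordingly, we have $(\Theta \cup \Gamma', \{\neg b\}) \not\vdash_{poss} a$.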
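A brute-force sketch of Transformation 3 on the MLN of Example 6, with $k = 1$, is given below. It makes several simplifying assumptions: penalties are used directly as certainty levels (they are ordered in the same way as the weights $\lambda_x$), there are no hard rules, the constraint (13) is added whether or not the rule for $E$ is pruned, and all names are hypothetical.

```python
from itertools import combinations, product

ATOMS = ["a", "b"]
MLN = [
    (2, lambda w: (not w["a"]) or w["b"]),
    (2, lambda w: w["a"] or w["b"]),
    (1, lambda w: w["a"] or (not w["b"])),
]
WORLDS = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=len(ATOMS))]
LITERALS = [(atom, sign) for atom in ATOMS for sign in (True, False)]

def holds(lit, world):
    """A literal is a pair (atom, polarity)."""
    return world[lit[0]] == lit[1]

def sat_value(world):
    return sum(weight for weight, f in MLN if f(world))

BEST = max(sat_value(w) for w in WORLDS)

def map_worlds(evidence):
    """max(M, E) for a consistent set of literals E."""
    cands = [w for w in WORLDS if all(holds(l, w) for l in evidence)]
    top = max(sat_value(w) for w in cands)
    return [w for w in cands if sat_value(w) == top]

def pen(evidence):
    return BEST - sat_value(map_worlds(evidence)[0])

def evidence_sets(k):
    """All consistent sets of at most k literals."""
    for size in range(k + 1):
        for lits in combinations(LITERALS, size):
            if len({atom for atom, _ in lits}) == size:
                yield frozenset(lits)

def transformation3(k):
    """List of (formula, level) pairs; the level of a formula is a penalty value."""
    levels = sorted({pen(e) for e in evidence_sets(k)})
    theory = []
    for e in evidence_sets(k):
        x = frozenset(l for l in LITERALS if all(holds(l, w) for w in map_worlds(e)))
        level = pen(e)
        pruned = any(all(holds(y, w) for w in map_worlds(e - {y})) for y in e)
        if not pruned:  # rule: AND E -> AND X
            theory.append((lambda w, e=e, x=x: (not all(holds(l, w) for l in e))
                           or all(holds(l, w) for l in x), level))
        if level > 0:  # constraint (13), one level below the rule
            below = max(l for l in levels if l < level)
            theory.append((lambda w, e=e, x=x: not all(holds(l, w) for l in e | x), below))
    return theory

def poss_models(theory, evidence):
    """Models of the cut at the consistency level of the theory plus the evidence."""
    for level in sorted({lvl for _, lvl in theory}):
        found = [w for w in WORLDS if all(holds(l, w) for l in evidence)
                 and all(f(w) for f, lvl in theory if lvl >= level)]
        if found:
            return found
    return [w for w in WORLDS if all(holds(l, w) for l in evidence)]

def poss_entails(theory, evidence, clause):
    return all(any(holds(l, w) for l in clause) for w in poss_models(theory, evidence))

def map_entails(evidence, clause):
    return all(any(holds(l, w) for l in clause) for w in map_worlds(evidence))

THEORY = transformation3(1)
not_b = frozenset({("b", False)})
print(len(THEORY), poss_entails(THEORY, not_b, [("a", True)]))
```

The sketch produces five weighted formulas, corresponding to the three formulas of $\Theta$ and the two of $\Gamma'$ in Example 6, and with the evidence $\neg b$ it no longer derives $a$.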

Transformations 2 and 3 have complementary strengths. For example, Transformation 2 may lead to more compact theories for relatively simple MLNs, e.g. if for most of the considered evidence sets, there is a unique set of formulas from the MLN that characterizes the most probable models of the evidence (cf. Lemma 1). On the other hand, Transformation 3 may lead to substantially more compact theories in cases where the number of formulas is large relative to the number of atoms.

We now show the correctness of Transformation 3.

Proposition 3. Let $\mathcal{M}$ be an MLN, $k$ a positive integer and $\Theta^k_{\mathcal{M}}$ the proposed possibilistic logic encoding of $\mathcal{M}$. Furthermore, let $E$ and $C$ be sets of literals such that $|E| + |C| \leq k + 1$. It holds that $(\mathcal{M}, E) \vdash_{MAP} \bigvee C$ if and only if $(\Theta^k_{\mathcal{M}}, E) \vdash_{poss} \bigvee C$.

Before we prove Proposition 3 we present a number of lemmas. In the lemmas and proofs below, $\mathcal{M}$ will always be an MLN, $\Theta^k$ will be the corresponding possibilistic logic theory and $k$ will be the maximum size of the evidence sets considered in the translation.


Lemma 2. If $E$ is a set of literals, $|E| \leq k$, $\lambda = \phi(E)$ and $(\mathcal{M}, E) \vdash_{MAP} x$ then

A^) \E'\<\E\]h /\E^x 

Lemma 3. If $\phi(\omega) \leq \lambda$ then $\omega$ is a model of $\Theta^k_\lambda$.

Proof. If there were a formula $F = (\bigwedge E) \rightarrow (\bigwedge X)$ in $\Theta^k_\lambda$ that was not satisfied by $\omega$, then its body would have to be true in $\omega$ but then necessarily

$$\lambda \leq \phi(E) \leq \phi(\omega) \leq \lambda.$$

The first inequality follows from the fact that, by the construction of $\Theta^k$, if the certainty weight of $F$ is at least $\lambda$ then it must be the case that $\phi(E) \geq \lambda$. The second inequality follows from the fact that $\omega$ was assumed to be a model of $\bigwedge E$. It follows that:

$$pen(\mathcal{M}, \omega) = pen(\mathcal{M}, E).$$

However, this would mean that $\omega$ is also a most probable world of $(\mathcal{M}, E)$, but then $\omega \models F$ by construction of $\Theta^k$.

If there were an unsatisfied formula $F = \neg(\bigwedge E \wedge \bigwedge X)$ in $\Theta^k_\lambda$ then by construction we would have $\phi(E \cup X) > \lambda$. However, from $\omega \models \bigwedge E \wedge \bigwedge X$ we find $\phi(E \cup X) \leq \phi(\omega) \leq \lambda$, a contradiction.

Since all formulas in $\Theta^k$ are of the two considered types, it follows that all formulas from $\Theta^k$ whose certainty weight is at least $\lambda$ must be satisfied in $\omega$. □

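Lemma 4. If $(\mathcal{M}, E) \vdash_{MAP} (y_1 \vee \cdots \vee y_m)$ then

(i) for any $i$, either $(\mathcal{M}, E \cup \{\neg y_i\}) \vdash_{MAP} (y_1 \vee \cdots \vee y_{i-1} \vee y_{i+1} \vee \cdots \vee y_m)$ or $(\mathcal{M}, E) \vdash_{MAP} y_i$,

(ii) there exist a $j$ and a set $\{y'_1, ..., y'_{m'}\} \subseteq \{y_1, ..., y_m\} \setminus \{y_j\}$ such that $(\mathcal{M}, E \cup \{\neg y'_1, ..., \neg y'_{m'}\}) \vdash_{MAP} y_j$.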

Lemma 5. If $|C| + |E| \leq k + 1$, and $\lambda = \phi(E)$ then $\Theta^k_\lambda \cup E \vdash \bigvee C$ if and only if $(\mathcal{M}, E) \vdash_{MAP} \bigvee C$.

We now turn to the proof of Proposition 3.

Proof of Proposition 3. Let $E$ be an evidence set such that $|E| \leq k$ and let $\lambda = \phi(E)$. Given Lemma 3 it is sufficient to show that $con(\Theta^k, E) = \lambda$. It follows from Lemma 3 that $con(\Theta^k, E) \leq \lambda$. Let $X = \{x \mid (\mathcal{M}, E) \vdash_{MAP} x\}$ be the set of literals which can be derived from $(\mathcal{M}, E)$ using MAP inference. By construction, $\Theta^k$ contains a formula $\neg(\bigwedge E \wedge \bigwedge X)$ with a certainty weight which is just below $\lambda$. Specifically, for $\lambda' < \lambda$ we either have $\Theta^k_{\lambda'} = \Theta^k_\lambda$, or $\Theta^k_{\lambda'} \vdash \neg \bigwedge E$, from which we find $con(\Theta^k, E) = \lambda$. □


It is of interest to remove any formulas in $\Theta^k$ that are redundant, among others because this is likely to make the theory easier to interpret. Although we can use possibilistic logic inference to identify redundant formulas, in some cases we can avoid adding the redundant formulas altogether. For example, in the transformation procedure, we do not add any rules for $E$ if it holds that $E \setminus \{y\} \vdash_{MAP} y$ for some $y \in E$. This pruning rule is the counterpart of the cautious monotonicity property, which is well-known in the context of default reasoning ES). Any ranking on possible worlds also satisfies the stronger rational monotonicity property, which translated to our setting states that when $(\mathcal{M}, E \setminus \{y\}) \vdash_{MAP} x$ and $(\mathcal{M}, E \setminus \{y\}) \not\vdash_{MAP} \neg y$ it holds that $(\mathcal{M}, E) \vdash_{MAP} x$. Accordingly, when processing the evidence set $E$ in the transformation procedure, instead of $(\bigwedge E \rightarrow \bigwedge X, \lambda_E)$ it is sufficient to add the following rule:

$$(\bigwedge E \rightarrow \bigwedge (X \setminus X_0), \lambda_E)$$

where

$$X_0 = \{x \mid E \setminus \{y\} \vdash_{MAP} x \text{ and } E \setminus \{y\} \not\vdash_{MAP} \neg y\}$$

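The correctness of this pruning step follows from the following proposition.

Proposition 4. Let $x$ and $y$ be literals. If $|E| < k$, $(\mathcal{M}, E) \vdash_{MAP} x$ and $(\mathcal{M}, E) \not\vdash_{MAP} \neg y$ then:

$$\Theta^k \setminus \{F\} \vdash (\bigwedge E \wedge y \rightarrow x, \lambda_{E \cup \{y\}})$$

where $F$ is the formula in $\Theta^k$ corresponding to the evidence set $E \cup \{y\}$, i.e.:

$$F = \bigwedge (E \cup \{y\}) \rightarrow \bigwedge \{x \mid (\mathcal{M}, E \cup \{y\}) \vdash_{MAP} x\}$$

Proof. If $(\mathcal{M}, E) \vdash_{MAP} x$ and $(\mathcal{M}, E) \not\vdash_{MAP} \neg y$ then $(\mathcal{M}, E \cup \{y\}) \vdash_{MAP} x$ and $pen(\mathcal{M}, E) = pen(\mathcal{M}, E \cup \{y\}) = pen(\mathcal{M}, E \cup \{x, y\})$. Therefore using Lemma 2 we find that $\Theta^k \vdash \bigwedge E \rightarrow x$. From Lemma 2 it furthermore follows that $\bigwedge E \rightarrow x$ can be derived from rules with antecedents of length at most $|E|$. In particular, we find that $\bigwedge E \rightarrow x$ can be derived without using the formula $\bigwedge E \wedge y \rightarrow \bigwedge X$. □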

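Finally, note that formulas of the form (13) can be omitted when $pen(\mathcal{M}, E \setminus \{y\}) < pen(\mathcal{M}, E)$ for some $y \in E$. Indeed, in such a case we find from $pen(\mathcal{M}, E \setminus \{y\}) < pen(\mathcal{M}, E)$ that $(\mathcal{M}, E \setminus \{y\}) \vdash_{MAP} \neg y$, hence (13) will be entailed by a formula of the form $(\bigwedge (E \setminus \{y\}) \rightarrow \bigwedge X, \lambda_{E \setminus \{y\}})$ in $\Theta^k$.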

4 ENCODING NON-GROUND NETWORKS

We now provide the counterpart to the construction from Section 3.2 for non-ground MLNs. The first-order nature of MLNs often leads to distributions with many symmetries which can be exploited by lifted inference methods [20]. We can similarly exploit these symmetries for constructing more compact possibilistic logic theories from MLNs.

For convenience, in the possibilistic logic theories, we will use typed formulas. For instance, when we have the formula $\alpha = owns(person : X, thing : Y)$ and the set of constants of the type person is $\{alice, bob\}$ and the set of constants of the type thing is $\{car\}$ then $\alpha$ corresponds to the ground formulas $owns(alice, car)$ and $owns(bob, car)$. In cases where there is only one type, we will not write it explicitly.

Two typed formulas $F_1$ and $F_2$ are said to be isomorphic when there is a type-respecting substitution $\theta$ of the variables of $F_1$ such that $F_1\theta \equiv F_2$ (where $\equiv$ denotes equivalence of logical formulas). Two MLNs $\mathcal{M}_1$ and $\mathcal{M}_2$ are said to be isomorphic, denoted by $\mathcal{M}_1 \approx \mathcal{M}_2$, if there is a bijection $i$ from formulas of $\mathcal{M}_1$ to formulas of $\mathcal{M}_2$ such that for $i(F, w) = (F', w')$ it holds that $w = w'$ and the formulas $F$ and $F'$ are isomorphic. When $j$ is a permutation of a subset of constants from $\mathcal{M}$ then $j(\mathcal{M})$ denotes the MLN obtained by replacing any constant $c$ from the subset by its image $j(c)$.

Given a non-ground MLN $\mathcal{M}$, we can first identify sets of constants which are interchangeable, where a set of constants $C_t$ is said to be interchangeable if $j(\mathcal{M}) \approx \mathcal{M}$ for any permutation $j$ of the constants in $C_t$. Note that to check whether a set of constants $C_t$ is interchangeable, it is sufficient to check that $j(\mathcal{M}) \approx \mathcal{M}$ for those permutations which swap just two constants from $C_t$. For every maximal set $C_t$ of interchangeable constants, we introduce a new type $t$. For a constant $c$, we write $\tau(c)$ to denote its type. When $F$ is a ground formula, $variabilize(F)$ denotes the following formula:

$$\bigwedge \{V_c \neq V_d \mid c, d \in const(F), c \neq d, \tau(c) = \tau(d)\} \rightarrow F'$$

where $const(F)$ is the set of constants appearing in $F$ and $F'$ is obtained from $F$ by replacing all constants $c$ by a new variable $V_c$ of type $\tau(c)$.
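The operation $variabilize$ can be sketched on a simple data representation. The code assumes that a ground formula is given as a list of literals `(sign, predicate, arguments)`, that types are supplied as a dictionary, and that the antecedent collects an inequality between the variables of any two distinct constants of the same type; the representation and the variable naming scheme `V_c` are illustrative choices.

```python
from itertools import combinations

def variabilize(literals, type_of):
    """Replace every constant c by a variable V_c.

    Returns (distinct, lifted): `distinct` lists the pairs of variables that must
    denote different constants (same-type constants only), and `lifted` is the
    formula with constants replaced by variables.
    """
    constants = sorted({c for _, _, args in literals for c in args})
    var = {c: "V_" + c for c in constants}
    distinct = [(var[c], var[d]) for c, d in combinations(constants, 2)
                if type_of[c] == type_of[d]]
    lifted = [(sign, pred, tuple(var[c] for c in args)) for sign, pred, args in literals]
    return distinct, lifted

# Hypothetical ground formula over two persons and one thing.
TYPES = {"alice": "person", "bob": "person", "car": "thing"}
GROUND = [
    (True, "owns", ("alice", "car")),
    (False, "owns", ("bob", "car")),
]
distinct, lifted = variabilize(GROUND, TYPES)
print(distinct, lifted)
```

Only the two person constants share a type, so a single inequality between their variables is generated.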

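Transformation 4. Given an MLN $\mathcal{M}$ and a positive integer $k$, we construct a possibilistic logic theory $\Theta^k_{\mathcal{M}}$ as follows:

• For each hard rule $F$ from $\mathcal{M}$, add $(F, 1)$ to $\Theta^k_{\mathcal{M}}$.

• For each set of literals $E$ such that $0 \leq |E| \leq k$, let $X = \{x \mid (\mathcal{M}, E) \vdash_{MAP} x\}$. For all $x \in X$, unless there is a literal $y \in E$ such that $(\mathcal{M}, E \setminus \{y\}) \vdash_{MAP} y$ and unless $\Theta^k_{\mathcal{M}}$ already contains a formula isomorphic to $variabilize(\bigwedge E \rightarrow x)$, add

$$(variabilize(\bigwedge E \rightarrow x), \lambda_E)$$

to $\Theta^k_{\mathcal{M}}$. If $pen(\mathcal{M}, E) > pen(\mathcal{M}, \emptyset)$ and $\Theta^k_{\mathcal{M}}$ does not already contain a formula isomorphic to $variabilize(\neg(\bigwedge E \wedge \bigwedge X))$, add also

$$(variabilize(\neg(\bigwedge E \wedge \bigwedge X)), \lambda'_E) \qquad (14)$$

where $\lambda'_E$ is the certainty level just below $\lambda_E$ in $\Theta^k_{\mathcal{M}}$.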

As before, we will usually omit the subscript in $\Theta^k_{\mathcal{M}}$. We can show that after grounding, $\Theta^k$ is equivalent to the theory that would be obtained by first grounding the MLN and then applying the method from Section 3.2. The correctness proof is provided in the online appendix.

Our implementation of Transformation 4 relies on an efficient implementation of inference in possibilistic logic and Markov logic, efficient generation of non-redundant candidate evidence sets and efficient filtering of isomorphic formulas. For MAP inference in MLNs, we used a cutting-plane inference algorithm based on a SAT-based optimization. For inference in possibilistic logic, we also used cutting-plane inference in order to avoid having to ground
the whole theory. To find the ground rules that need to 
be added by the cutting-plane method, we used a modified 
querying system from ifT^ . For solving and optimizing the 
resulting ground programs, we used the SAT4J library 13. 

Note that to check whether $\Theta^{k} \vdash F$, where $F$ is a (not necessarily ground) clause, it is sufficient to find one (type-respecting) grounding $\theta$ of $F$, and check whether $\Theta^{k} \cup \{\neg(F\theta)\}$ is inconsistent. In this way, we can check whether a rule is implied by $\Theta^{k}$ without grounding the whole theory because, as for MLNs, inference in non-ground possibilistic logic theories can be carried out by cutting-plane inference methods.
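The refutation check described above can be illustrated with a small Python sketch. It assumes that the relevant ground clauses are already available (the paper obtains them lazily by cutting-plane inference, which is not modelled here) and tests satisfiability by brute force instead of calling a SAT solver. Clauses are frozensets of (atom, sign) pairs, and the certainty values and atom names are hypothetical.

```python
from itertools import product

def satisfiable(clauses):
    """Brute-force satisfiability test for a list of ground clauses."""
    atoms = sorted({a for c in clauses for a, _ in c})
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(any(world[a] == sign for a, sign in c) for c in clauses):
            return True
    return False

def entails(theory, clause, level=0.0):
    """Check whether the formulas with certainty >= level entail the ground clause.

    The clause is entailed iff adding the negation of each of its literals
    makes the selected part of the theory inconsistent.
    """
    cut = [c for c, certainty in theory if certainty >= level]
    negated = [frozenset({(a, not sign)}) for a, sign in clause]
    return not satisfiable(cut + negated)

# Hypothetical ground theory: a soft rule "bird -> flies" and a certain fact "bird".
theory = [
    (frozenset({("bird_tweety", False), ("flies_tweety", True)}), 0.3),
    (frozenset({("bird_tweety", True)}), 1.0),
]
query = frozenset({("flies_tweety", True)})
derivable = entails(theory, query)
derivable_at_high_level = entails(theory, query, level=0.5)
```

Here the query follows from the full theory, but no longer follows once only formulas with certainty at least 0.5 are retained.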

We implemented the transformation as a modification of the standard best-first search (BFS) algorithm which constructs incrementally larger candidate evidence sets, checks their MAP consequences and adds the respective rules to the possibilistic logic theory being constructed. Like the standard BFS algorithm, it uses a hash-table based data structure closed, in which already processed evidence sets are stored. In order to avoid having to check isomorphism with every evidence set in closed each time a new evidence set is considered, the stored evidence sets are enriched by fingerprints which contain some invariants, guaranteeing that no two variabilized evidence sets with different fingerprints are isomorphic. In this way, we can efficiently check for a given evidence set $E$ whether there is a previously generated evidence set $E'$ such that $variabilize(E)$ and $variabilize(E')$ are isomorphic.
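A possible shape for such a fingerprint-indexed closed set is sketched below in Python. The paper does not say which invariants its fingerprints contain, so the two used here (the multiset of literal signatures and the number of constants per type) are assumptions, as are the class and function names. The exact isomorphism test is a brute-force search over type-respecting bijections and is only run inside one fingerprint bucket.

```python
from collections import Counter, defaultdict
from itertools import permutations

def fingerprint(evidence, type_of):
    """Invariants shared by all evidence sets that are isomorphic after variabilizing."""
    lits = Counter((pos, pred, tuple(type_of[c] for c in args))
                   for pos, pred, args in evidence)
    consts = Counter(type_of[c] for c in {c for _, _, args in evidence for c in args})
    return (frozenset(lits.items()), frozenset(consts.items()))

def isomorphic(e1, e2, type_of):
    """True iff a type-respecting bijection on constants maps e1 onto e2."""
    c1 = sorted({c for _, _, args in e1 for c in args})
    c2 = sorted({c for _, _, args in e2 for c in args})
    if len(c1) != len(c2) or len(e1) != len(e2):
        return False
    target = set(e2)
    for perm in permutations(c2):
        m = dict(zip(c1, perm))
        if any(type_of[a] != type_of[b] for a, b in m.items()):
            continue
        image = {(pos, pred, tuple(m[c] for c in args)) for pos, pred, args in e1}
        if image == target:
            return True
    return False

class Closed:
    """Hash table of processed evidence sets, bucketed by fingerprint."""

    def __init__(self, type_of):
        self.type_of = type_of
        self.buckets = defaultdict(list)

    def add_if_new(self, evidence):
        """Store evidence unless an isomorphic set is already present."""
        evidence = frozenset(evidence)
        bucket = self.buckets[fingerprint(evidence, self.type_of)]
        if any(isomorphic(evidence, seen, self.type_of) for seen in bucket):
            return False
        bucket.append(evidence)
        return True

# Hypothetical usage with three constants of one type.
types = {"alice": "person", "bob": "person", "carol": "person"}
closed = Closed(types)
r1 = closed.add_if_new([(True, "f", ("alice", "bob"))])
r2 = closed.add_if_new([(True, "f", ("bob", "carol"))])     # isomorphic to the first
r3 = closed.add_if_new([(True, "f", ("alice", "alice"))])   # different fingerprint
```

Equal fingerprints do not imply isomorphism, which is why the exact check is still needed within a bucket.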

As a final remark, we note that for the non-ground transformation, it may be preferable to replace any rule $(variabilize(\neg(\bigwedge E \wedge \bigwedge X)), \lambda'_E)$ by the rule $(variabilize(\neg \bigwedge E), \lambda'_E)$. The reason is that the former rules may often become too long in the non-ground case. On the other hand, for the ground transformation, the advantage of the longer rules is that they will often be the same for different sets $E$, which, in effect, means a smaller number of rules in the possibilistic logic theory. The correctness of this alternative to Transformation 4 is also shown in the online appendix.

The implementation can be downloaded from: https://github.com/supertweety/mln2poss.

5 ILLUSTRATIVE EXAMPLES 

The first example is a variation on a classical problem from non-monotonic reasoning. Here, we want to express that birds generally fly, but heavy antarctic birds do not fly, unless they have a jet pack. The MLN which we will convert into possibilistic logic contains the following rules: $10 : bird(X) \rightarrow flies(X)$, $1 : antarctic(X) \rightarrow \neg flies(X)$, $10 : heavy(X) \rightarrow \neg flies(X)$, $100 : hasJetPack(X) \rightarrow flies(X)$. When presented with this MLN, Transformation 4 produces the following possibilistic logic theory.

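$(\neg antarctic(X) \vee \neg flies(X), \lambda_0)$
$(\neg bird(X) \vee flies(X), \lambda_0)$
$(\neg heavy(X) \vee \neg flies(X), \lambda_0)$
$(flies(X) \vee \neg hasJetPack(X), \lambda_0)$
$(\neg bird(X) \vee flies(X) \vee hasJetPack(X), \lambda_1)$
$(\neg heavy(X) \vee antarctic(X) \vee \neg flies(X), \lambda_1)$
$(\neg bird(X) \vee \neg heavy(X), \lambda_1)$
$(\neg antarctic(X) \vee \neg heavy(X) \vee \neg flies(X), \lambda_{10})$
$(flies(X) \vee \neg hasJetPack(X) \vee bird(X), \lambda_{11})$
$(\neg bird(X) \vee flies(X) \vee \neg hasJetPack(X), \lambda_{100})$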

Let us consider the evidence set $E = \{bird(tweety), heavy(tweety)\}$. Then the levels $\lambda_0$ and $\lambda_1$ drown because of the inconsistency with the rule $(\neg bird(X) \vee \neg heavy(X), \lambda_1)$, which was produced as one of the rules (14). We can see from the rest of the possibilistic logic theory that unless we add either $antarctic(tweety)$ or $hasJetPack(tweety)$, we cannot say anything about whether tweety flies or not. It can be verified that the same is true also for the respective MLN.
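The claim about the MLN can be checked by brute force. The Python sketch below grounds the four rules for the single constant tweety and enumerates all worlds; it assumes that the penalty of a world is the sum of the weights of the rules it violates and that MAP worlds are those of minimum penalty. The encoding and function names are hypothetical.

```python
from itertools import product

ATOMS = ["bird", "antarctic", "heavy", "hasJetPack", "flies"]

# Each rule is (weight, premise atom, value that flies must take if the premise holds).
MLN = [
    (10, "bird", True),
    (1, "antarctic", False),
    (10, "heavy", False),
    (100, "hasJetPack", True),
]

def penalty(world):
    """Sum of the weights of the rules violated in the given world."""
    return sum(w for w, premise, flies in MLN
               if world[premise] and world["flies"] != flies)

def map_worlds(evidence):
    """Return (minimum penalty, list of minimum-penalty worlds) for the evidence."""
    worlds = [dict(zip(ATOMS, v)) for v in product([False, True], repeat=len(ATOMS))]
    worlds = [w for w in worlds if all(w[a] == v for a, v in evidence.items())]
    best = min(penalty(w) for w in worlds)
    return best, [w for w in worlds if penalty(w) == best]

def flies_verdict(evidence):
    """True or False if all MAP worlds agree on flies, None if they disagree."""
    _, worlds = map_worlds(evidence)
    values = {w["flies"] for w in worlds}
    return values.pop() if len(values) == 1 else None

base = {"bird": True, "heavy": True}
undecided = flies_verdict(base)
with_antarctic = flies_verdict({**base, "antarctic": True})
with_jetpack = flies_verdict({**base, "hasJetPack": True})
```

With only bird and heavy as evidence, the two rules of weight 10 cancel out and the MAP worlds disagree on flies; adding antarctic tips the balance towards not flying, and adding the jet pack towards flying.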

The second example consists of formulas from a classical MLN about smokers. There are three predicates in this MLN: a binary predicate $f(A, B)$ denoting that $A$ and $B$ are friends, and two unary predicates $s(A)$ and $c(A)$ denoting that $A$ smokes and that $A$ has cancer, respectively. The MLN contains the following hard rules: $\neg f(A, B) \vee f(B, A)$ and $\neg f(A, A)$. In addition, we have two soft rules. The first soft rule $10 : \neg s(A) \vee \neg f(A, B) \vee s(B)$ states that if $A$ and $B$ are friends and $A$ smokes then $B$ is more likely to smoke too. The second rule $10 : \neg s(A) \vee c(A)$ states that smoking increases the likelihood of cancer. The following possibilistic logic theory was obtained using Transformation 4 with $k = 4$.

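$(s(B) \vee \neg f(A, B) \vee \neg s(A) \vee \neg alldiff(A, B), \lambda_0)$
$(\neg s(A) \vee c(A), \lambda_0)$
$(\neg f(C, B) \vee \neg f(A, B) \vee s(A) \vee s(C) \vee \neg alldiff(A, B, C) \vee \neg s(B), \lambda_{10})$
$(\neg f(C, B) \vee \neg s(A) \vee \neg f(A, C) \vee s(C) \vee \neg alldiff(A, B, C) \vee \neg s(B), \lambda_{10})$
$(\neg s(A) \vee \neg f(C, A) \vee s(C) \vee c(B) \vee \neg alldiff(A, B, C) \vee \neg s(B), \lambda_{10})$
$(\neg s(A) \vee c(A) \vee c(B) \vee \neg s(B) \vee \neg alldiff(A, B), \lambda_{10})$
$(s(B) \vee \neg f(A, B) \vee \neg s(A) \vee c(A) \vee \neg alldiff(A, B), \lambda_{10})$
$(\neg f(A, B) \vee f(B, A), 1)$
$(\neg f(A, A), 1)$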

At the lowest level $\lambda_0$ we find the counterparts of the soft rules from the MLN, whereas at level 1 we find the hard rules. At the intermediate level we intuitively find weakened rules from the MLN. For instance, the rule $(\neg s(A) \vee c(A) \vee c(B) \vee \neg s(B) \vee \neg alldiff(A, B), \lambda_{10})$ can be interpreted as: if $A$ and $B$ smoke then at least one of them has cancer. It is quite natural that this rule has a higher certainty weight than the rule: if $A$ smokes then $A$ has cancer.
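The difference between the two levels can be traced back to the evidence sets that generate the rules. The following worked computation is a sketch under two assumptions that are not restated in this section: the penalty of an evidence set is the smallest total weight of violated ground soft rules over all worlds satisfying it, and the subscript of a certainty level reflects that penalty.

```latex
\begin{align*}
pen(\mathcal{M}, \{s(A)\}) &= 0,
  & (\mathcal{M}, \{s(A)\}) &\vdash_{MAP} c(A), \\
pen(\mathcal{M}, \{s(A), \neg c(A), s(B)\}) &= 10,
  & (\mathcal{M}, \{s(A), \neg c(A), s(B)\}) &\vdash_{MAP} c(B).
\end{align*}
```

In the first case a world in which nobody is friends and $c(A)$ holds violates no soft rule. In the second case the evidence itself violates the grounding $\neg s(A) \vee c(A)$, which costs 10, and any world that additionally makes $c(B)$ false violates $\neg s(B) \vee c(B)$ as well, for a total of 20. Hence $c(B)$ holds in every MAP world, and the resulting rule is attached to evidence that is already penalized, which places it at a higher level than the plain soft rule.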

A final, more elaborate example is provided in the online 
appendix. 

6 RELATED WORK 

One line of related work focuses on extracting a comprehensible model from another learned model that is difficult or impossible to interpret. A seminal work in this area is the TREPAN Q algorithm. Given a trained neural network and a data set, TREPAN learns a decision tree to mimic the predictions of the neural network. In addition to producing interpretable output, this algorithm was shown to learn accurate models that faithfully mimicked the neural network's predictions. More recent research has focused on approximating complex ensemble classifiers with a single model. For example, Popovic et al. [21] proposed a method for learning a single decision tree that mimics the predictions of a random forest.

While, to the best of our knowledge, this is the first paper that studies the relation between Markov logic and possibilistic logic, the links between possibility theory and probability theory have been widely studied. For example, IH has proposed a probability-possibility transformation based on the view that a possibility measure corresponds to a particular family of probability measures. Dempster-Shafer evidence theory [25] has also been used to provide a probabilistic interpretation of possibility degrees. In particular, a possibility distribution can be interpreted as the contour function of a mass assignment; see ifTTIl for details. In ifTHl it is shown how the probability distribution induced by a penalty logic theory corresponds to the contour function of a mass assignment, which suggests that it is indeed natural to interpret this probability distribution as a possibility distribution. Several other links between possibility theory and probability theory have been discussed in Ibl.

In this paper, we have mainly focused on MAP inference. An interesting question is whether it would be possible to construct a (possibilistic) logic base that captures the set of accepted beliefs encoded by a probability distribution, where $A$ is accepted if $P(A) > P(\neg A)$. Unfortunately, the results in m show that this is only possible for the limited class of so-called big-stepped probability distributions. In practice, this means that we would have to define a partition of the set of possible worlds, such that the probability distribution over the partition classes is big-stepped, and only capture the beliefs that are encoded by the latter, less informative, probability distribution. A similar approach was taken in HI to learn default rules from data.

7 CONCLUSIONS 

This paper has focused on how a Markov logic network $\mathcal{M}$ can be encoded in possibilistic logic. We started from the observation that it is always possible to construct a possibilistic logic theory $\Theta_{\mathcal{M}}$ that is equivalent to $\mathcal{M}$, in the sense that the probability distribution induced by $\mathcal{M}$ is isomorphic to the possibility distribution induced by $\Theta_{\mathcal{M}}$. As a result, applying possibilistic logic inference to $\Theta_{\mathcal{M}}$ yields the same conclusions as applying MAP inference to $\mathcal{M}$. Although the size of $\Theta_{\mathcal{M}}$ is exponential in the number of formulas in $\mathcal{M}$, we have shown how more compact theories can be obtained in cases where we can put restrictions on the types of evidence that need to be considered (e.g. small sets of literals).

Our main motivation has been to use possibilistic logic as 
a way to make explicit the assumptions encoded in a given 
MLN. Among others, the possibilistic logic theory could be 
used to generate explanations for predictions made by the 
MLN, to gain insight into the data from which the MLN 
was learned, or to identify errors in the structure or weights 
of the MLN. Taking this last idea one step further, our aim 
for future work is to study methods for repairing a given 
MLN, based on the mistakes that have thus been identified. 
A PROOFS

Proof of Proposition 1. Let $X = \{(F, w) \in \mathcal{M} : \omega \not\models F\}$ be the set of rules from the MLN not satisfied in $\omega$. By construction, the possibilistic logic theory $\Theta$ contains the rule
(V X*,x) where A = </)(- V X*) = ^ 

K+pen{M.,uj) ^ ^ result, w ^ On the other hand, 

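$\omega \models \Theta_{\lambda'}$ for any $\lambda' > \lambda$. Indeed, for $(\bigvee Y^*, \lambda') \in \Theta$ and $\lambda' > \lambda$, we have by construction that $Y^* \not\subseteq X^*$, i.e. $Y^*$ contains a formula $\alpha$ which is not in $X^*$. By definition of $X$, this means $\omega \models \alpha$, and thus in particular $\omega \models \bigvee Y^*$. Since $\pi$ was assumed to be the least specific model of $\Theta$, it holds that $\pi(\omega) = 1 - \max\{\lambda_i \mid (\alpha_i, \lambda_i) \in \Theta,\ \omega \not\models \alpha_i\}$. Hence we can conclude $\pi(\omega) = 1 - \phi(\omega)$. □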

Proof of Lemma 2. The lemma can be proved by induction on $|E|$. The base case $|E| = 0$ is obvious. Let us assume that the lemma holds for $|E| = n < k$. If $E$ does not contain any literal $y$ such that $(\mathcal{M}, E \setminus \{y\}) \vdash_{MAP} y$ then $\Theta^{k}$ contains a formula $\bigwedge E \rightarrow \bigwedge X$ with $x \in X$ and the lemma clearly holds. Otherwise, if there is a literal $y \in E$ such that $(\mathcal{M}, E \setminus \{y\}) \vdash_{MAP} y$ then we must also have $(\mathcal{M}, E \setminus \{y\}) \vdash_{MAP} x$. Moreover $\phi(E) = \phi(E \setminus \{y\}) = \lambda$ in this case. By induction we find that the formula $\bigwedge (E \setminus \{y\}) \rightarrow x$ can be derived from $\{(\bigwedge E' \rightarrow \bigwedge X') \in \Theta^{k} \text{ s.t. } |E'| < |E|\}$. □

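Proof of Lemma 3. We have that $\{E_1, \ldots, E_l\} \cap Y \neq \emptyset$ for every $Y \in Cons_E(\mathcal{M})$ iff $\bigvee \{\bigwedge Y \mid Y \in Cons_E(\mathcal{M})\} \wedge \neg E_1 \wedge \ldots \wedge \neg E_l$ is inconsistent. This in turn means that the most plausible world of $E \wedge \neg E_1 \wedge \ldots \wedge \neg E_l$ cannot be among the most plausible worlds of $E$, and thus $pen(\mathcal{M}, E \wedge \neg E_1 \wedge \ldots \wedge \neg E_l) > pen(\mathcal{M}, E)$. □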

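Proof of Lemma 4. (i) Let $\Omega$ be the set of most probable worlds of $(\mathcal{M}, E)$, let $\Omega'_i$ be the set of most probable worlds of $(\mathcal{M}, E \cup \{\neg y_i\})$ and let $\Omega''_i$ be the set of most probable worlds of $(\mathcal{M}, E)$ in which $y_i$ is false. In general $\Omega'_i \neq \Omega''_i$. Either $(\mathcal{M}, E) \vdash_{MAP} (y_1 \wedge \cdots \wedge y_k)$ or at least one of the sets $\Omega''_i$ must be nonempty. Let $\Omega''_{i'}$ be such a nonempty set. It must hold that $(\mathcal{M}, E \cup \{\neg y_{i'}\}) \vdash_{MAP} (y_1 \vee \cdots \vee y_{i'-1} \vee y_{i'+1} \vee \cdots \vee y_k)$ because, in this case, $\Omega'_{i'} = \Omega''_{i'} \subseteq \Omega$. (ii) The second part of the lemma follows from repeated application of the first part. □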

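Proof of Lemma 5. ($\Rightarrow$) It follows from Lemma 3 that any most probable model of $E$ is a model of $\Theta^{k} \cup E$, hence if $\Theta^{k} \cup E \vdash \bigvee C$ then $(\mathcal{M}, E) \vdash_{MAP} \bigvee C$.

($\Leftarrow$) If $(\mathcal{M}, E) \vdash_{MAP} (c_1 \vee \cdots \vee c_m)$, then by Lemma 4 we have $(\mathcal{M}, E \cup C') \vdash_{MAP} c_j$ for some $j \in \{1, \ldots, m\}$ and $C' \subseteq \{\neg c_1, \ldots, \neg c_{j-1}, \neg c_{j+1}, \ldots, \neg c_m\}$. Since $|E| + |C'| \leq k$, by Lemma 2 we have $\Theta^{k} \vdash \neg(\bigwedge E) \vee \neg(\bigwedge C') \vee c_j$ and therefore $\Theta^{k} \cup E \vdash \bigvee C$. □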


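Correctness of Transformation 4

In the following lemmas, $\mathcal{M}$ will be an MLN, $k$ will be an integer, and $\Gamma_{\mathcal{M}}^{k}$ will be the ground possibilistic logic encoding given by Transformation 3 in which, for convenience, we replace any rule $(\bigwedge E \rightarrow \bigwedge X, \lambda_E)$ by a set of rules $(\bigwedge E \rightarrow x, \lambda_E)$ where $x \in X$. $\Theta_{\mathcal{M}}^{k}$ will be the non-ground possibilistic logic encoding given by Transformation 4. Recall that $\Theta_{\mathcal{M}}^{k}$ is given together with a set of constants, typed according to their interchangeability. The ground possibilistic logic theory obtained by grounding $\Theta_{\mathcal{M}}^{k}$ using this set of typed constants will be denoted by $\widehat{\Theta}_{\mathcal{M}}^{k}$.

Lemma 6. Let $\alpha = (\bigwedge E \rightarrow x, \lambda_E)$ be a rule. Then $\alpha \in \Gamma_{\mathcal{M}}^{k}$ if and only if $\alpha \in \widehat{\Theta}_{\mathcal{M}}^{k}$.

Proof. ($\Rightarrow$) If $\alpha \in \Gamma_{\mathcal{M}}^{k}$ then a rule isomorphic to $(variabilize(\bigwedge E \rightarrow x), \lambda_E)$ must be contained in $\Theta_{\mathcal{M}}^{k}$, but then $\alpha$ must be among the groundings of that rule and therefore also be in $\widehat{\Theta}_{\mathcal{M}}^{k}$.

($\Leftarrow$) If $\alpha \in \widehat{\Theta}_{\mathcal{M}}^{k}$ then $\Theta_{\mathcal{M}}^{k}$ must contain a rule $(variabilize(\bigwedge E' \rightarrow x'), \lambda_E)$ isomorphic to $(variabilize(\bigwedge E \rightarrow x), \lambda_E)$ such that $(\mathcal{M}, E') \vdash_{MAP} x'$ and such that there is no literal $y' \in E'$ satisfying $(\mathcal{M}, E' \setminus \{y'\}) \vdash_{MAP} y'$. It follows from the fact that constants of the same type are interchangeable that then also $(\mathcal{M}, E) \vdash_{MAP} x$ and that there is no literal $y \in E$ satisfying $(\mathcal{M}, E \setminus \{y\}) \vdash_{MAP} y$. From this it immediately follows that $\alpha \in \Gamma_{\mathcal{M}}^{k}$. □

Lemma 7. Let $\alpha = (\neg(\bigwedge E \wedge \bigwedge X), \lambda'_E)$ be a rule. Then $\alpha \in \Gamma_{\mathcal{M}}^{k}$ if and only if $\alpha \in \widehat{\Theta}_{\mathcal{M}}^{k}$.

Proof. The proof of this lemma is very similar to the proof of Lemma 6.

($\Rightarrow$) If $\alpha \in \Gamma_{\mathcal{M}}^{k}$ then a rule isomorphic to $(variabilize(\neg(\bigwedge E \wedge \bigwedge X)), \lambda'_E)$ must be contained in $\Theta_{\mathcal{M}}^{k}$, but then $\alpha$ must be among the groundings of that rule and therefore also be in $\widehat{\Theta}_{\mathcal{M}}^{k}$.

($\Leftarrow$) If $\alpha \in \widehat{\Theta}_{\mathcal{M}}^{k}$ then $\Theta_{\mathcal{M}}^{k}$ must contain a rule $(variabilize(\neg(\bigwedge E' \wedge \bigwedge X')), \lambda'_E)$ isomorphic to $(variabilize(\neg(\bigwedge E \wedge \bigwedge X)), \lambda'_E)$ such that $(\mathcal{M}, E') \vdash_{MAP} x'$ for all $x' \in X'$. It follows from the fact that constants of the same type are interchangeable that then also $(\mathcal{M}, E) \vdash_{MAP} x$ for all $x \in X$, from which it immediately follows that $\alpha \in \Gamma_{\mathcal{M}}^{k}$. □

Proposition 5. Given an MLN $\mathcal{M}$ and an integer $k$, the possibilistic logic theories obtained by Transformation 3 and by grounding the possibilistic logic theory obtained by Transformation 4 are equivalent.

Proof. The proof follows from Lemma 6 and Lemma 7. □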




Correctness of the alternative to Transformation 4

We show the correctness of the alternative transformation, in which we replace rules of the form

$$\left(variabilize\left(\neg\left(\bigwedge E \wedge \bigwedge X\right)\right), \lambda'_E\right)$$

by the rule

$$\left(variabilize\left(\neg \bigwedge E\right), \lambda'_E\right)$$

The correctness of this alternative transformation can be shown as follows. Let $\Theta^{k}$ be the possibilistic logic theory obtained by Transformation 4 and let $\Theta_{alt}^{k}$ be the possibilistic logic theory obtained by the alternative transformation. It holds that $\neg \bigwedge E \models \neg(\bigwedge E \wedge \bigwedge X)$ and consequently also $variabilize(\neg \bigwedge E) \models variabilize(\neg(\bigwedge E \wedge \bigwedge X))$. Therefore if $\Theta^{k} \cup E$ is inconsistent then $\Theta_{alt}^{k} \cup E$ must be inconsistent as well. Moreover, since $\lambda'_E < \phi(E)$, we can show (using arguments analogous to those used in the proof of Proposition 3) that this alternative transformation also satisfies the properties stated for Transformation 4 in Proposition 3.

B ADDITIONAL EXAMPLE 

In this section we illustrate Transformation 4 on a larger MLN trained for predicting categories of computer science papers, which is shown in Table 1. This MLN contains rules for predicting categories of papers from categories of other papers which refer to them (rules 2-3) or from categories of papers written by the same author (rule 1). In addition, it contains rules giving prior probabilities of the individual categories and a hard rule specifying that every paper has at most one category (for simplicity).

We applied Transformation 4 to this MLN, which resulted in a possibilistic logic theory with 200 rules. Table 2 displays a subset of the rules in the resulting theory, which are interesting for illustrating some properties of the MLN and which are not immediately obvious from the MLN itself. Notice that, apart from the formulas of the form (14), we present formulas in the implication form to make the interpretation easier.

Rule $\alpha$ is one of the rules enforcing drowning. Rule $\beta$ states that in absence of other evidence, the category of any paper is assumed to be AI. Rule $\gamma$ cannot intuitively be justified and is probably an unintended consequence of the MLN. It states that if V1 wrote a paper which is not from the category AI, then V1 did not write any other paper. This rule is not harmful when the MLN is used for the purpose of predicting categories but it indicates that the MLN would not be suitable for predicting authorship (which is not really surprising given that the MLN was not trained for this task). The rules $\delta$ and $\eta$ are representatives of rules which capture the prior distribution of the categories (note that there are more rules of this kind which we do not show here). Rule $\zeta$ is similar to $\gamma$. The rules $\iota$, $\kappa$, $\lambda$, $\mu$, $\nu$ and $\xi$ state that if one paper refers to another paper then they typically have the same category. Such rules provide evidence that the MLN is meaningful for predicting categories.

We obtained the structure of the MLN from the Tuffy website: http://i.Stanford.edu/hazy/tuffy/download/

There are two modes for filtering redundant rules in the possibilistic logic theory in our implementation. The more aggressive filtering iteratively removes rules which are entailed by other rules in the theory (with the same or greater certainty weights), whereas the less aggressive filtering only iteratively removes the rules which are entailed by a subset of the rules from the theory which are all shorter or equally long. The latter filtering results in slightly more interpretable results. In the experiments reported here, we have used the less aggressive filtering.



Weight    Formula
1         $\neg wrote(per:A, pap:C) \vee \neg wrote(per:A, pap:B) \vee category(pap:C, cat:D) \vee \neg category(pap:B, cat:D)$
2         $\neg refers(pap:A, pap:B) \vee category(pap:B, cat:C) \vee \neg category(pap:A, cat:C)$
2         $\neg refers(pap:A, pap:B) \vee category(pap:A, cat:C) \vee \neg category(pap:B, cat:C)$
-3        $category(pap:A, cat:net)$
0.14      $category(pap:A, cat:prog)$
0.09      $category(pap:A, cat:os)$
0.04      $category(pap:A, cat:hw\_arch)$
0.11      $category(pap:A, cat:ds\_alg)$
0.04      $category(pap:A, cat:enc\_compr)$
0.02      $category(pap:A, cat:ir)$
0.05      $category(pap:A, cat:db)$
0.39      $category(pap:A, cat:ai)$
0.06      $category(pap:A, cat:hci)$
0.06      $category(pap:A, cat:net)$
$\infty$  $eq(cat:B, cat:C) \vee \neg category(pap:A, cat:C) \vee \neg category(pap:A, cat:B)$

Table 1: Markov Logic Network for CORA.


$\alpha = (\neg category(pap:V1, cat:prog), \lambda_0)$
$\beta = (\rightarrow category(pap:V1, cat:ai), \lambda_0)$
----------
$\gamma = (wrote(per:V1, pap:V3) \wedge \neg eq(pap:V2, pap:V3) \wedge \neg category(pap:V2, cat:ai) \rightarrow \neg wrote(per:V1, pap:V2), \lambda_1)$
$\delta = (\neg category(pap:V1, cat:ai) \rightarrow category(pap:V1, cat:prog), \lambda_1)$
$\epsilon = (\neg category(pap:V1, cat:ds\_alg), \lambda_1)$
----------
$\zeta = (category(pap:V2, cat:ds\_alg) \wedge wrote(per:V1, pap:V3) \wedge \neg eq(pap:V2, pap:V3) \rightarrow \neg wrote(per:V1, pap:V2), \lambda_2)$
$\eta = (\neg category(pap:V1, cat:ai) \wedge \neg category(pap:V1, cat:prog) \rightarrow category(pap:V1, cat:ds\_alg), \lambda_2)$
$\theta = (\neg category(pap:V1, cat:os), \lambda_2)$
----------
$\iota = (\neg eq(pap:V1, pap:V2) \wedge refers(pap:V1, pap:V2) \wedge \neg category(pap:V2, cat:ai) \rightarrow category(pap:V1, cat:prog)$, As)
$\kappa = (\neg eq(pap:V1, pap:V2) \wedge refers(pap:V2, pap:V1) \wedge \neg category(pap:V2, cat:ai) \rightarrow category(pap:V1, cat:prog)$, As)
----------
$\lambda = (\neg eq(pap:V1, pap:V2) \wedge refers(pap:V2, pap:V1) \wedge category(pap:V2, cat:ds\_alg) \rightarrow category(pap:V1, cat:ds\_alg), \lambda_{11})$
$\mu = (\neg eq(pap:V1, pap:V2) \wedge refers(pap:V2, pap:V1) \wedge category(pap:V1, cat:ds\_alg) \rightarrow category(pap:V2, cat:ds\_alg), \lambda_{11})$
----------
$\nu = (\neg eq(pap:V1, pap:V2) \wedge refers(pap:V2, pap:V1) \wedge category(pap:V1, cat:os) \rightarrow category(pap:V2, cat:os), \lambda_{14})$
$\xi = (\neg eq(pap:V1, pap:V2) \wedge refers(pap:V2, pap:V1) \wedge category(pap:V2, cat:os) \rightarrow category(pap:V1, cat:os), \lambda_{14})$

Table 2: Subset of the possibilistic logic theory obtained for the CORA MLN with maximum evidence set size k = 2. The different levels of the theory are separated by horizontal lines.








